Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds hostname information to the system information captured during evaluation runs, which helps identify and isolate issues with specific GPUs or nodes. The implementation uses Python's socket.gethostname() to capture the machine's hostname and includes it in evaluation reports.
Key changes:
- Adds
hostnamefield to theSystemInfodataclass - Captures hostname using
socket.gethostname()inmake_system_info() - Displays hostname in generated system information reports
- Also adds
device_countto report output (previously not displayed) - Includes an unrelated error message formatting change
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
src/libkernelbot/run_eval.py |
Adds socket import, hostname field to SystemInfo dataclass, and captures hostname in make_system_info() |
src/libkernelbot/report.py |
Adds both device_count and hostname to system information output in generate_system_info() |
src/libkernelbot/submission.py |
Fixes error message string concatenation by removing redundant f-string prefix, but introduces spacing issue |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Running on: | ||
| * GPU: `{system.gpu}` | ||
| * CPU: `{system.cpu}` | ||
| * Device count: `{system.device_count}` |
There was a problem hiding this comment.
The addition of device_count to the report output is not mentioned in the PR description and appears to be an undocumented change. Consider adding this to the PR description or splitting it into a separate PR if it's intended behavior. This also lacks test coverage - the test_generate_system_info test should be updated to verify device_count appears in the output.
| * Runtime: `{system.runtime}` | ||
| * Platform: `{system.platform}` | ||
| * Torch: `{system.torch}` | ||
| * Hostname: `{system.hostname}` |
There was a problem hiding this comment.
The new hostname field in the system information output lacks test coverage. The test_generate_system_info test in tests/test_report.py should be updated to include hostname in the sample_system_info fixture and verify it appears in the generated output.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
basically what it says, can help us isolate faulty gpus/nodes